Embedding strategies for effective use of information from multiple sequence alignments.
Abstract
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.
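The consensus-embedding idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(start, rows)` block format, the use of ungapped blocks, and the simple plurality rule for choosing a consensus residue are all assumptions made for illustration. The sketch overwrites only the conserved regions of a representative sequence and leaves every other position as the original single-sequence residue.

```python
from collections import Counter


def consensus_residue(column):
    """Return the most frequent residue in one alignment column.

    Plurality voting is an assumption of this sketch, not necessarily
    the consensus rule used in the paper.
    """
    return Counter(column).most_common(1)[0][0]


def embed_consensus(representative, blocks):
    """Embed consensus residues into a representative sequence.

    representative: the single sequence standing in for the family.
    blocks: hypothetical format, a list of (start, rows) pairs, where
        start is the 0-based offset of the conserved region in the
        representative and rows are equal-length, ungapped aligned
        segments from family members.
    Positions outside the blocks keep the representative's residues.
    """
    seq = list(representative)
    for start, rows in blocks:
        width = len(rows[0])
        if any(len(row) != width for row in rows):
            raise ValueError("all rows in a block must have equal length")
        if start < 0 or start + width > len(seq):
            raise ValueError("block does not fit inside the representative")
        # zip(*rows) walks the block column by column.
        for offset, column in enumerate(zip(*rows)):
            seq[start + offset] = consensus_residue(column)
    return "".join(seq)


# Usage: one conserved region of width 5 starting at offset 3.
query = embed_consensus(
    "MKTAYIAKQRQISFVKSHFSRQ",
    [(3, ["AYIAK", "AYLAK", "GYLAK"])],
)
print(query)
```

Because the result is still an ordinary sequence of the same length, it could be passed to any single-sequence search program, which is the property the abstract attributes to consensus embedding. Embedding PSSMs instead would require a search program that accepts per-position score columns, which this sketch does not attempt.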
Similar resources
Quality assessment of multiple alignment programs.
A renewed interest in the multiple sequence alignment problem has given rise to several new algorithms. In contrast to traditional progressive methods, computationally expensive score optimization strategies are now predominantly employed. We systematically tested four methods (Poa, Dialign, T-Coffee and ClustalW) for the speed and quality of their alignments. As test sequences we used structur...
Molecular cloning of adenylate kinase from the human filarial parasite Onchocerca volvulus
Adenylate kinases (ADK) are ubiquitous enzymes that contribute to the homeostasis of adenine nucleotides in living cells. This study describes the cloning of a cDNA encoding an adenylate kinase from the filaria Onchocerca volvulus. Using PCR, a 281 bp cDNA fragment encoding part of an adenylate kinase was isolated from an O. volvulus cDNA library. Use of this fragment as a p...
Integration of Alignment and Phylogeny in the Whole-Genome Era
Dissertation by Hongtao Sun, Doctor of Philosophy in Computer Science, Washington University in St. Louis, 2015. Professor Jeremy Buhler, Chair. With the development of new sequencing techniques, whole genomes of many species have become available. This huge amount of data gives rise to new opportunities and challenges. These new...
Adaptive BLASTing through the Sequence Dataspace: Theories on Protein Sequence Embedding
A major computational challenge in the genomic era is annotating structure/function to the vast quantities of sequence information now available. This problem is illustrated by the fact that most proteins lack comprehensive annotation, even when experimental evidence exists. We theorized that phylogenetic profiles provide a quantitative method that can relate the structural and functional prope...
PROMALS3D: a tool for multiple protein sequence and structure alignments
Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural informat...
Journal:
- Protein science : a publication of the Protein Society
Volume 6, Issue 3
Pages: -
Publication date: 1997